Skip to content

Allow interaction with ray client service via Route from outside of OCP cluster #100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jun 22, 2023

Conversation

tedhtchang
Copy link
Member

@tedhtchang tedhtchang commented May 11, 2023

How to test:

# clone the branch
git clone --single-branch --branch ray-loca-interaction https://github.com/tedhtchang/codeflare-sdk.git
cd codeflare-sdk/

# create python virtual env with [pyenv](https://github.com/pyenv/pyenv#automatic-installer)
pyenv install 3.8.13
pyenv virtualenv 3.8.13 localinter
pyenv local localinter

# Install python packages
pip install -r requirements.txt
pip install -r requirements-dev.txt
pip install jupyterlab
pip install -e .

# Run a sameple noteook and then Run -> Run all cells
jupyter lab demo-notebooks/interactive/local_interactive.ipynb

# clean up
pyenv uninstall localinter
# delete the cloned repo

Expect something
image

@tedhtchang tedhtchang force-pushed the ray-loca-interaction branch 2 times, most recently from 5f65ec2 to d6175c4 Compare May 13, 2023 00:29
@tedhtchang
Copy link
Member Author

@MichaelClifford I passed all the tests. looks good. I also renamed the example notebook.

@tedhtchang tedhtchang requested a review from Maxusmusti May 31, 2023 22:53
@MichaelClifford
Copy link
Collaborator

Thanks for adding more tests @tedhtchang

LGTM

@tedhtchang tedhtchang changed the title [WIP] Allow interaction with ray client service via Route from outside of OCP cluster Allow interaction with ray client service via Route from outside of OCP cluster Jun 6, 2023
jbusche
jbusche previously approved these changes Jun 13, 2023
Copy link
Contributor

@jbusche jbusche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you Ted!
Note, for testing, there were two items that were different from what was listed:

  1. Had to install pip install -r requirements.txt in addition to the requirements-dev.txt in order to get openshift installed in my pyenv
  2. The notebook file is named local_interactive.ipynb

@jbusche
Copy link
Contributor

jbusche commented Jun 19, 2023

@tedhtchang, with the latest code with the Ray Operator v0.5.0, the worker is now able to start all 3 init containers and that's working ok. But when I try to do the import ray step I'm getting:

File ~/.pyenv/versions/3.8.13/envs/localinter/lib/python3.8/site-packages/ray/util/client/worker.py:260, in Worker._connect_channel(self, reconnecting)
    252 if log_once("ray_client_security_groups"):
    253     warnings.warn(
    254         "Ray Client connection timed out. Ensure that "
    255         "the Ray Client port on the head node is reachable "
   (...)
    258         "more information."
    259     )
--> 260 raise ConnectionError("ray client connection timeout")

ConnectionError: ray client connection timeout

@tedhtchang
Copy link
Member Author

Tested on my own mac and another Ubuntu VM from scratch. Both env seems to work. @MichaelClifford @Maxusmusti Could you test again?

@jbusche
Copy link
Contributor

jbusche commented Jun 20, 2023

I tried one last time last night and local interaction worked on my cluster to default namespace from my Mac. I'm not certain why it worked this time, but perhaps a new cluster and/or some local changes on my Mac... not sure.
Was also trying to test from an Ubuntu image, but I ran out of time trying to setup my Ubuntu with the proper prereqs. Last thing Ted wanted me to try was a different namespace other than default. Will post update here once my cluster rebuilds and I can test.

@jbusche
Copy link
Contributor

jbusche commented Jun 20, 2023

OK - tried on the IBM network with both default and jim namespaces - looked good!

jbusche
jbusche previously approved these changes Jun 20, 2023
Copy link
Contributor

@jbusche jbusche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Ted, passing the tests in two namespaces
Thank you!

jbusche
jbusche previously approved these changes Jun 21, 2023
Copy link
Contributor

@jbusche jbusche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Tested in both default and non-default namespaces... looks good.

Maxusmusti
Maxusmusti previously approved these changes Jun 22, 2023
Copy link
Collaborator

@Maxusmusti Maxusmusti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tedhtchang tedhtchang dismissed stale reviews from Maxusmusti and jbusche via deeb62c June 22, 2023 18:25
Copy link
Collaborator

@MichaelClifford MichaelClifford left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Tested PR on cluster with KubeRay v0.5.0 and works as expected.

Copy link
Collaborator

@Maxusmusti Maxusmusti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just two quick comments before I merge

Copy link
Collaborator

@Maxusmusti Maxusmusti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants